Query expansion and dimensionality reduction: Notions of optimality in Rocchio relevance feedback and latent semantic indexing

Author

  • Miles Efron
Abstract

Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method’s basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI’s and Rocchio’s notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI’s motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
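The sketch below makes the two vector-space extensions concrete. It is a hypothetical illustration, not the paper's analysis.

It covers two pieces:

- **Rocchio.** It applies the classic update α·q + β·centroid(relevant) − γ·centroid(non-relevant). The weights α = 1, β = 0.75, γ = 0.15 and the clipping of negative weights to zero are assumed conventional choices.
- **LSI.** It projects queries and documents onto the top-k left singular vectors of the term-document matrix A. These are computed as the leading eigenvectors of A Aᵀ with a plain Jacobi eigensolver, so only the standard library is needed.

The function names and the toy vocabulary are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean of `vectors`; the zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15) -> Vector:
    """Classic Rocchio update: alpha*q + beta*centroid(R) - gamma*centroid(NR).

    Default weights and clipping negatives to zero are assumed conventions.
    """
    dim = len(query)
    rel, non = centroid(relevant, dim), centroid(nonrelevant, dim)
    return [max(0.0, alpha * query[i] + beta * rel[i] - gamma * non[i])
            for i in range(dim)]


def jacobi_eigen(sym, sweeps=100, tol=1e-12):
    """Cyclic Jacobi eigendecomposition of a symmetric matrix.

    Returns (eigenvalues, V) where the eigenvectors are the columns of V.
    """
    n = len(sym)
    a = [list(map(float, row)) for row in sym]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def lsi_basis(docs, k):
    """Top-k left singular vectors of the term-document matrix A.

    Computed as the leading eigenvectors of A A^T = sum over docs of d d^T.
    """
    n = len(docs[0])
    gram = [[sum(d[i] * d[j] for d in docs) for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigen(gram)
    order = sorted(range(n), key=lambda i: vals[i], reverse=True)[:k]
    return [[vecs[r][i] for r in range(n)] for i in order]


def project(basis, x) -> Vector:
    """Coordinates of x in the k-dimensional latent subspace (U_k^T x)."""
    return [sum(u[i] * x[i] for i in range(len(x))) for u in basis]


def cosine(x, y) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


if __name__ == "__main__":
    # Hypothetical toy vocabulary: car, auto, engine, fruit, apple
    docs = [[1, 0, 1, 0, 0],   # d0: car engine
            [0, 1, 1, 0, 0],   # d1: auto engine
            [0, 0, 0, 1, 1],   # d2: fruit apple
            [0, 0, 0, 1, 0]]   # d3: fruit
    q = [1, 0, 0, 0, 0]        # query: "car"
    print(rocchio(q, relevant=[docs[1]], nonrelevant=[docs[3]]))
    basis = lsi_basis(docs, k=2)
    print(cosine(q, docs[1]), cosine(project(basis, q), project(basis, docs[1])))
```

On this toy data, the query "car" shares no term with d1 ("auto engine"), so their raw cosine is 0. In the two-dimensional latent space the two vectors point in the same direction, because "car" and "auto" both co-occur with "engine". A realistic system would use a sparse truncated SVD rather than a dense Jacobi solver; the solver here is only a stand-in to keep the example dependency-free.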


Similar resources

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are one of the most popular tools on the Internet, which are widely used by expert and novice users. Constructing an adequate query which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are one of the most popular tools on the Internet, which are widely used by experienced and inexperienced users. Constructing an adequate query, which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method...

Dirichlet Mixtures for Query Estimation in Information Retrieval

Treated as small samples of text, user queries require smoothing to better estimate the probabilities of their true model. Traditional techniques to perform this smoothing include automatic query expansion and local feedback. This paper applies the bioinformatics smoothing technique, Dirichlet mixtures, to the task of query estimation. We discuss Dirichlet mixtures’ relation to relevance models...

Rocchio’s Relevance Feedback Algorithm in basic vector comparison and LSI models (ROCKNROLL)

Relevance feedback is a query reformulation technique that improves the effectiveness of information retrieval. The basic idea is to do an initial query, get feedback from the user as to what documents he or she considers relevant and then use this information to supplement and enrich the user’s initial query, allowing greater retrieval performance [2]. Relevance feedback is an iterative proces...

Latent Semantic Indexing with selective Query Expansion

This article describes our experiments during participation in the Legal Track of the 2011 Text Retrieval Conference. We incorporated machine learning, via selective query expansion, into our existing EDLSI system. We also were able to expand the number of dimensions used within our EDLSI system. Our results show that EDLSI is an effective technique for E-Discovery. We also have shown that sele...

Journal:
  • Inf. Process. Manage.

Volume 44

Publication date: 2008